Tenuous Relations and Timeseries Analysis

Nathaniel Forde

Data Science @ Personio


Agenda

  1. Motivation: Background Problem
  2. Structural Autoregressive Models
  3. Structural Autoregressive Models in PYMC
  4. Vector Autoregressive Models
  5. Bayesian Vector Autoregressive Models in PYMC
  6. Hierarchical BVARs in PYMC

Motivation: Background Problem


Disclaimers and Acknowledgements


Note

No Personio data used in this presentation. The problem discussed motivated the research, but the models presented below will use fake or public data.

We will be discussing the problem of App performance measurements and customer feedback metrics.

How do they influence one another, and how best can we distil the relationship between the two?

Note

Acknowledgements: The work on the BVAR discussed here took inspiration and borrowed heavily from the PYMC labs blogpost here. The example PRs benefited from the feedback of the PYMC devs.

Warning

Not an Economist!!!


Systems and Feedback: App Performance


“Balancing feedback loops are equilibrating or goal-seeking structures in systems and are both sources of stability and resistance to change”

“Allowing performance standards to be influenced by past performance… sets up a reinforcing feedback of eroding goals that sets a system drifting toward low performance.”


Disentangling Individual effects


Identifying the Channel of Transmission


Lagging Effects in Time


Structural Autoregressive Models


Autoregressive Models


  • Regression: The Linear Projection of Y onto X is the OLS solution.
  • The Wold Decomposition: If a random variable $Y_t$ is covariance stationary then $Y_t$ has a linear representation:
    $Y_t = \mu + \sum_{j=0}^{\infty} b_j e_{t-j}$
  • Auto-regression: Extends this idea to the Hilbert space of the infinite past history:
    • $Y_t = \mu + \sum_{j=1}^{\infty} a_j Y_{t-j} + e_t$
    • Where the error terms $e$ represent white noise $N(0, 1)$
  • “…this justifies linear models as approximations” - Hansen Econometrics

  • Most timeseries are far shorter than their infinite past.
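In practice the infinite AR representation is truncated to a finite lag order. A minimal sketch of simulating a finite AR(p) process with numpy (function name and parameter values here are illustrative, not from the slides):

```python
import numpy as np

def simulate_ar(coefs, mu, sigma, n_steps, seed=0):
    """Simulate a finite AR(p) process: y_t = mu + sum_j a_j * y_{t-j} + e_t."""
    rng = np.random.default_rng(seed)
    p = len(coefs)
    # Zero-initialised burn-in values stand in for the first p lags
    y = np.zeros(n_steps + p)
    for t in range(p, n_steps + p):
        # coefs[0] multiplies the most recent lag y_{t-1}
        y[t] = mu + np.dot(coefs, y[t - p:t][::-1]) + rng.normal(0, sigma)
    return y[p:]

y = simulate_ar([0.7], mu=1.0, sigma=0.5, n_steps=500)
# For AR(1) with |a| < 1 the stationary mean is mu / (1 - a)
```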

Moving Average Representation and Impulse Response


  • The coefficients in $Y_t = \mu + \sum_{j=0}^{\infty} b_j e_{t-j}$ are known as the impulse response function (IRF), representing the change in $Y_{t+j}$ due to a shock at time $t$:

    • $\frac{\partial}{\partial e_t} Y_{t+j} = b_j$
  • They can be recovered from the AR representation by recursive reconstruction.

  • IRFs are often reported when scaled by the standard deviation of shocks.
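The recursive reconstruction mentioned above can be sketched directly: given AR coefficients $a_k$, the MA coefficients satisfy $b_0 = 1$ and $b_j = \sum_k a_k b_{j-k}$. A small numpy sketch (the function name is mine):

```python
import numpy as np

def irf_from_ar(ar_coefs, horizon):
    """Recover impulse-response (MA) coefficients b_j from AR coefficients a_k
    via the recursion b_0 = 1, b_j = sum_k a_k * b_{j-k}."""
    b = np.zeros(horizon + 1)
    b[0] = 1.0
    for j in range(1, horizon + 1):
        b[j] = sum(a * b[j - k] for k, a in enumerate(ar_coefs, start=1) if j >= k)
    return b

irf = irf_from_ar([0.7], horizon=4)  # AR(1): b_j = 0.7 ** j
```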


Stationarity Requirements


Tricks to achieve stationarity

Conditions required for Stationarity
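One standard statement of the condition: an AR(p) process is covariance stationary when all eigenvalues of its companion matrix lie strictly inside the unit circle. A numpy sketch of that check (helper name is mine):

```python
import numpy as np

def is_stationary(ar_coefs):
    """AR(p) stationarity check: all eigenvalues of the companion matrix
    must lie strictly inside the unit circle."""
    p = len(ar_coefs)
    companion = np.zeros((p, p))
    companion[0, :] = ar_coefs          # first row holds the AR coefficients
    if p > 1:
        companion[1:, :-1] = np.eye(p - 1)  # sub-diagonal identity shifts the lags
    return bool(np.all(np.abs(np.linalg.eigvals(companion)) < 1))

is_stationary([0.5, 0.3])   # stationary AR(2)
is_stationary([1.2])        # explosive AR(1)
```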


Structural Autoregressive Models


  • Bayesian Structural Timeseries (BSTS) models are often seen as an alternative to auto-regressive models1
  • But BSTS models can also incorporate latent auto-regressive components.
  • Benefits of going Bayesian:
    • Transparent understanding of the sources of uncertainty within your model and a capacity to introduce outside information by way of priors or other covariates.
    • Plausible counterfactual inference with posterior predictive distributions
  1. Nice post by Kim Larsen here


Characterising Structure


$\begin{aligned}
Y &\sim \text{TruncNorm}(\mu, \sigma, 0, \infty) \\
\mu &= AR_{\mu} + f(\text{seasonality}) + f(\text{trend}) + \ldots \\
\sigma &\sim \text{InverseGamma}(3, 0.5) \\
\text{trend} &= \alpha + \beta \cdot T_i \\
\alpha &\sim \text{Normal}(7, 1) \\
\beta &\sim \text{Normal}(0.009, 0.1) \\
\text{seasonality} &= \text{FourierFeatures}' \beta_{fourier} \\
\beta_{fourier} &\sim \text{Normal}(0, 0.5) \\
AR_{\mu} &\sim \mu_{ar} + a_1 \cdot Y_{t-1} + a_2 \cdot Y_{t-2} \ldots + \gamma
\end{aligned}$


Structural Autoregressive Models in PYMC


Bayesian Structural Timeseries


  • Priors
  • Additive Structural Components
  • Likelihood

with AR:
        ## Data containers to enable prediction
        t = pm.MutableData("t", t_data, dims="obs_id")
        y = pm.MutableData("y", ar_data, dims="obs_id")
        # AR priors: The first coefficient will be the intercept term
        coefs = pm.Normal("coefs", priors["coefs"]["mu"], priors["coefs"]["sigma"])
        sigma = pm.HalfNormal("sigma", priors["sigma"])
        ## Priors for the linear trend component
        alpha = pm.Normal("alpha", priors["alpha"]["mu"], priors["alpha"]["sigma"])
        beta = pm.Normal("beta", priors["beta"]["mu"], priors["beta"]["sigma"])
        ## Priors for seasonality
        beta_fourier = pm.Normal(
            "beta_fourier",
            mu=priors["beta_fourier"]["mu"],
            sigma=priors["beta_fourier"]["sigma"],
            dims="fourier_features",
        )

        #  AR process: We need one init variable for each lag, hence size is variable too
        init = pm.Normal.dist(
            priors["init"]["mu"], priors["init"]["sigma"], size=priors["init"]["size"]
        )
        # Steps of the AR model minus the lags required given specification
        ar1 = pm.AR(
            "ar",
            coefs,
            sigma=sigma,
            init_dist=init,
            constant=True,
            steps=t.shape[0] - (priors["coefs"]["size"] - 1),
            dims="obs_id",
        )
        # Trend Component
        trend = pm.Deterministic("trend", alpha + beta * t, dims="obs_id")

        # Seasonality Component
        fourier_terms = pm.MutableData("fourier_terms", ff)
        seasonality = pm.Deterministic(
            "seasonality", pm.math.dot(beta_fourier, fourier_terms), dims="obs_id"
        )
        ## Additive Combination
        mu = ar1 + trend + seasonality

        # The Likelihood
        outcome = pm.Normal("likelihood", mu=mu, sigma=sigma, observed=y, dims="obs_id")
        ## Sampling
        idata_ar = pm.sample_prior_predictive()


Prior Predictive and Posterior Predictive Distributions


Autoregressive BSTS prior and posterior fits. For details see PYMC docs here


Causal Inference with Interrupted Timeseries Designs


Causal Impact analysis; for more detail see CausalPy


Vector Autoregressive Models


Multivariate Timeseries


  • The Multivariate Linear Projection solution applies, giving:
    • $y_T = \nu + A_1 y_{T-1} + A_2 y_{T-2} \ldots A_p y_{T-p} + e_t$

    • Wold Decomposition:

      $y_T = \mu + \Omega_1 e_{T-1} + \Omega_2 e_{T-2} \ldots \Omega_p e_{T-p}$

  • Multivariate Auto-regressive representation with multivariate white-noise error terms:
    $\begin{bmatrix} gdp \\ inv \\ con \end{bmatrix}_T = \nu + A_1 \begin{bmatrix} gdp \\ inv \\ con \end{bmatrix}_{T-1} + A_2 \begin{bmatrix} gdp \\ inv \\ con \end{bmatrix}_{T-2} \ldots A_p \begin{bmatrix} gdp \\ inv \\ con \end{bmatrix}_{T-p} + e_t$
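The multivariate recursion can be simulated directly. A minimal numpy sketch of a two-equation VAR(1), with illustrative intercepts and coefficients (not estimates from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
nu = np.array([0.1, 0.2])          # intercept vector
A1 = np.array([[0.5, 0.1],
               [0.2, 0.4]])        # lag-1 coefficient matrix

n_steps = 300
y = np.zeros((n_steps, 2))
for t in range(1, n_steps):
    # Each series depends on the lagged values of BOTH series
    y[t] = nu + A1 @ y[t - 1] + rng.normal(0, 0.1, size=2)
# When all eigenvalues of A1 lie inside the unit circle,
# the stationary mean is (I - A1)^{-1} nu
```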

VAR(2)


A VAR can have n lags and m equations such that the lagged terms of each series appear in all equations.

$gdp_t = \beta_{gdp1} \cdot gdp_{t-1} + \beta_{gdp2} \cdot gdp_{t-2} + \beta_{cons1} \cdot cons_{t-1} + \beta_{cons2} \cdot cons_{t-2} + \epsilon_{gdp}$
$cons_t = \beta_{cons1} \cdot cons_{t-1} + \beta_{cons2} \cdot cons_{t-2} + \beta_{gdp1} \cdot gdp_{t-1} + \beta_{gdp2} \cdot gdp_{t-2} + \epsilon_{cons}$

As such the coefficients on the cross-equation variables are of particular interest when investigating the influence of one series on another.

The multivariate IRF can be constructed analogously to the univariate case, giving an interpretation of e.g. the change in $gdp_t$ due to a shock in $cons_{t-2}$.

This opens up the possibility of testing how influence between variables percolates over time.


Impulse Response


Statsmodels implementation of IRF


Vector Autoregressive Models in PYMC


Arranging the Coefficient Matrices


  • There are a lot of indices and variables in a VAR model.
  • They can be concisely expressed in matrix notation, but that is not intuitive (to me) programmatically.
### Define a helper function that will construct our autoregressive step for the marginal contribution of each lagged
### term in each of the respective time series equations
def calc_ar_step(lag_coefs, n_eqs, n_lags, df):
    ars = []
    for j in range(n_eqs):
        ar = pm.math.sum(
            [
                pm.math.sum(lag_coefs[j, i] * df.values[n_lags - (i + 1) : -(i + 1)], axis=-1)
                for i in range(n_lags)
            ],
            axis=0,
        )
        # Collect the AR contribution of each equation inside the loop
        ars.append(ar)
    betaY = pm.math.stack(ars, axis=-1)

    return betaY

The Broad Structure


VAR(2)


Implementation in Code


with pm.Model(coords=coords) as model:
        lag_coefs = pm.Normal(
            "lag_coefs",
            mu=priors["lag_coefs"]["mu"],
            sigma=priors["lag_coefs"]["sigma"],
            dims=["equations", "lags", "cross_vars"],
        )
        alpha = pm.Normal(
            "alpha", mu=priors["alpha"]["mu"], sigma=priors["alpha"]["sigma"], dims=("equations",)
        )
        data_obs = pm.Data("data_obs", df.values[n_lags:], dims=["time", "equations"], mutable=True)

        betaX = calc_ar_step(lag_coefs, n_eqs, n_lags, df)
        betaX = pm.Deterministic("betaX", betaX, dims=["time",])
        mean = alpha + betaX

        if mv_norm:
            n = df.shape[1]
            ## Under the hood the LKJ prior will retain the correlation matrix too.
            noise_chol, _, _ = pm.LKJCholeskyCov(
                "noise_chol",
                eta=priors["noise_chol"]["eta"],
                n=n,
                sd_dist=pm.HalfNormal.dist(sigma=priors["noise_chol"]["sigma"]),
            )
            obs = pm.MvNormal(
                "obs", mu=mean, chol=noise_chol, observed=data_obs, dims=["time", "equations"]
            )

Applying the Theory


Macroeconomic Data


The Special Case of Ireland


Ireland’s Posterior Predictive Checks

Table 1: Posterior Mean Correlation

         GDP    CONS
GDP      1.00   0.43
CONS     0.43   1.00
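The correlations in Table 1 come from the sampled Cholesky factors of the noise covariance; converting a factor to a correlation matrix is a one-liner. A sketch with an illustrative Cholesky factor (not a real posterior draw):

```python
import numpy as np

# Illustrative Cholesky factor, standing in for one posterior draw of noise_chol
chol = np.array([[1.0, 0.0],
                 [0.6, 0.8]])
cov = chol @ chol.T               # implied covariance matrix
sd = np.sqrt(np.diag(cov))        # marginal standard deviations
corr = cov / np.outer(sd, sd)     # correlation matrix
# corr[0, 1] is 0.6 for this factor
```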

Comparison with Statsmodels MLE fit


Compare


Convergence Checks


Sampling Chains


Hierarchical Bayesian VARs

“Those who only know one country, know no country” – Seymour Martin Lipset


Pooling, Shrinkage and Hierarchy


Shrinkage to Centre Effect


How to model Hierarchical Structure


groups = df[group_field].unique()

    with pm.Model(coords=coords) as model:
        ## Hierarchical Priors
        rho = pm.Beta("rho", alpha=2, beta=2)
        alpha_hat_location = pm.Normal("alpha_hat_location", 0, 0.1)
        alpha_hat_scale = pm.InverseGamma("alpha_hat_scale", 3, 0.5)
        beta_hat_location = pm.Normal("beta_hat_location", 0, 0.1)
        beta_hat_scale = pm.InverseGamma("beta_hat_scale", 3, 0.5)
        omega_global, _, _ = pm.LKJCholeskyCov(
            "omega_global", n=n_eqs, eta=1.0, sd_dist=pm.Exponential.dist(1)
        )
        ## Group Specific Parameter
        for grp in groups:
            df_grp = df[df[group_field] == grp][cols]
            z_scale_beta = pm.InverseGamma(f"z_scale_beta_{grp}", 3, 0.5)
            z_scale_alpha = pm.InverseGamma(f"z_scale_alpha_{grp}", 3, 0.5)
            lag_coefs = pm.Normal(
                f"lag_coefs_{grp}",
                mu=beta_hat_location,
                # Hierarchical offset
                sigma=beta_hat_scale * z_scale_beta,
                dims=["equations", "lags", "cross_vars"],
            )
            alpha = pm.Normal(
                f"alpha_{grp}",
                mu=alpha_hat_location,
                # Hierarchical Offset
                sigma=alpha_hat_scale * z_scale_alpha,
                dims=("equations",),
            )

            betaX = calc_ar_step(lag_coefs, n_eqs, n_lags, df_grp)
            betaX = pm.Deterministic(f"betaX_{grp}", betaX)
            mean = alpha + betaX

            n = df_grp.shape[1]
            noise_chol, _, _ = pm.LKJCholeskyCov(
                f"noise_chol_{grp}", eta=10, n=n, sd_dist=pm.Exponential.dist(1)
            )
            # Hierarchical offset: blend the global and group-level covariance structure
            omega = pm.Deterministic(f"omega_{grp}", rho * omega_global + (1 - rho) * noise_chol)
            obs = pm.MvNormal(f"obs_{grp}", mu=mean, chol=omega, observed=df_grp.values[n_lags:])

Ireland is an Outlier


Country Specific Estimates


The Global Correlation Structure


Global Correlations


The Posterior Predictive Fits


Posterior Predictive Fits by country. See PYMC docs here


Wrap Up

  • Bayesian Timeseries models are flexible and transparent tools for forecasting and causal inference.
  • VAR models can help interrogate questions of the relationships between timeseries data.
  • Bayesian VAR models allow the specification of regularising priors and encode structural assumptions about direction of influence.
  • Hierarchical Bayesian VAR models offer the chance to borrow information across groups and interrogate questions about the relationships between timeseries within and across those groups.
  • In contexts with sparse or short timeseries, the hierarchical model can augment our understanding of the lagged relationships of influence.